A scalable tool architecture for diagnosing wait states in massively parallel applications
Abstract
When scaling message-passing applications to thousands of processors, their performance is often affected by wait states that occur when processes fail to reach synchronization points simultaneously. As a first step in reducing the performance impact, we have shown in our earlier work that wait states can be diagnosed by searching event traces for characteristic patterns. However, our initial sequential search method did not scale beyond several hundred processes. Here, we present a scalable approach, based on a parallel replay of the target application’s communication behavior, that can efficiently identify wait states at the previously inaccessible scale of 65,536 processes and that has potential for even larger configurations. We explain how our new approach has been integrated into a comprehensive parallel tool architecture, which we use to demonstrate that wait states may consume a major fraction of the execution time at larger scales.
Similar resources
Scalable communication architectures for massively parallel hardware multi-processors
Modern complex embedded applications in multiple application fields impose stringent and continuously increasing functional and parametric demands. To adequately serve these applications, massively parallel multi-processor systems on a single chip (MPSoCs) are required. This paper is devoted to the design of scalable communication architectures of massively parallel hardware multi-processors for...
Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns
Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work we defined such inefficiency patterns for MPI-2 one-sided communication, although still based on a serial trace-analysis scheme with limited scalability. In this article we show how wait states in one-sided communications can be detected in a more scalable fashion by t...
An Open Infrastructure for Scalable, Reconfigurable Analysis
Petascale systems will have hundreds of thousands of processor cores so their applications must be massively parallel. Effective use of petascale systems will require efficient interprocess communication through memory hierarchies and complex network topologies. Tools to collect and analyze detailed data about this communication would facilitate its optimization. However, several factors compli...
The Outlook for Scalable Parallel Processing
The commercial and technical markets are fundamentally different. Massively parallel processors may be more useful for commercial applications because of the parallelism implicit in accessing a database through multiple, independent transactions. Ease of programming will be the principal factor that determines how rapidly this class of computer architecture will penetrate the general-purpose com...
A Review of Surface-Enhanced Raman Spectroscopy on Potential Clinical Applications Towards Diagnosing Colorectal Cancer
Colorectal cancer (CRC) is one of the leading cancers in the world and early-screening is still the best method of cancer patient survival. However, colonoscopy as the current gold standard is not without flaws and an emerging technique called surface-enhanced Raman spectroscopy (SERS) coupled with machine learning is a possible candidate that could be applied in parallel with colonoscopy. This...
Journal title:
- Parallel Computing
Volume 35, Issue
Pages -
Publication date 2009